A cryptocurrency (or crypto currency, or crypto for short) is a digital asset designed to work as a medium of exchange. Individual coin ownership records are stored in a ledger, a computerized database that uses strong cryptography to secure transaction records, control the creation of additional coins and verify the transfer of coin ownership.
The cryptocurrency market has been volatile from the very beginning, and the last couple of years have been a particularly wild ride for millions of investors around the world. Many made millions on the big upswings, while many others lost investments large and small when bubbles burst and the market turned suddenly.
For this assignment we have consolidated financial information for the top cryptocurrencies by market cap (the extract contains 12 coins), taken from CoinMarketCap.com. We want to understand how the prices of these currencies have changed over time, and how the trading volume of the top 3 currencies changed over 2016-2019. The visuals are intended for two age groups: 17-35 and 60+.
Once the requirements are clear, the data will be loaded into a Python Jupyter notebook using the pandas library. After loading, we will check the data types to understand what kind of data we are dealing with. From the dataset we will then extract the following columns:
1) Currency (Object)
2) Date (Datetime)
3) High (Float)
4) Volume (Int)
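A minimal sketch of the loading and column-selection step. The real `consolidated_coin_data.csv` is not reproduced here, so the sketch reads three rows copied from the `df.head()` output further down through `io.StringIO`; the variable names `sample_csv` and `sample` are illustrative only.

```python
import io

import pandas as pd

# Three rows copied from the df.head() output of this notebook; the full
# consolidated_coin_data.csv file is assumed, not included.
sample_csv = """Currency,Date,Open,High,Low,Close,Volume,Market Cap
tezos,4-Dec-19,1.29,1.32,1.25,1.25,46048752,824588509
tezos,3-Dec-19,1.24,1.32,1.21,1.29,41462224,853213342
tezos,2-Dec-19,1.25,1.26,1.20,1.24,27574097,817872179
"""

# io.StringIO lets read_csv treat the string as if it were a file on disk.
sample = pd.read_csv(io.StringIO(sample_csv))

# Keep only the four columns the analysis needs.
sample = sample[["Currency", "Date", "High", "Volume"]]
print(sample.dtypes)
```

At this point Date is still text; it is parsed into dates in a later step.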
The Date column will be converted from text to a datetime and then reduced to the calendar year, so that it can be treated as a numeric variable in the plots. We will then check for missing data and handle it accordingly. Once the data is clean, we move on to visualization. For the required visuals we create two additional dataframes: the first with Currency, Date and High, and the second with Currency, Date and Volume. Because each currency has many daily rows per year, both are aggregated to the yearly maximum before reshaping. The first dataframe is converted into a pivot table with Date as the index, Currency as the columns and High as the values. For the second, we first keep only the years 2016-2019 and the top 3 currencies, i.e. Bitcoin, Ethereum and Tether, and then pivot it with Date as the index, Currency as the columns and Volume as the values.
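The filter-then-pivot flow for the second dataframe can be sketched on a tiny frame. The rows below are made-up toy values in the same long format as the notebook's `df` after Date has been reduced to a year; `toy`, `mask`, `df_sel`, `yearly` and `wide` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy data (invented values), one row per currency per day.
toy = pd.DataFrame({
    "Currency": ["bitcoin", "bitcoin", "ethereum", "ethereum", "xrp", "tether", "bitcoin"],
    "Date": [2015, 2016, 2016, 2017, 2017, 2018, 2016],
    "Volume": [10, 20, 5, 7, 9, 3, 25],
})

# Step 1: keep the years 2016-2019 (inclusive) and only the three chosen coins.
top3 = ["bitcoin", "ethereum", "tether"]
mask = toy["Date"].between(2016, 2019) & toy["Currency"].isin(top3)
df_sel = toy.loc[mask, ["Currency", "Date", "Volume"]]

# Step 2: collapse duplicate (Currency, Date) pairs to one value per year,
# keeping the yearly maximum as the notebook does.
yearly = df_sel.groupby(["Currency", "Date"], as_index=False)["Volume"].max()

# Step 3: reshape to wide format: one row per year, one column per coin.
# Missing (year, coin) combinations become NaN.
wide = yearly.pivot(index="Date", columns="Currency", values="Volume")
print(wide)
```

The xrp row and the 2015 row are dropped by the mask, and the two bitcoin rows for 2016 collapse to their maximum.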
After creating the pivot tables, we will create a line plot and a box plot with matplotlib for the first requirement. A line graph displays change over time as a series of data points connected by straight line segments on two axes; it is simple and clearly shows how a value evolves over time. A box plot summarizes the distribution of a set of values (median, quartiles and range), which makes it easy to display and compare distributions side by side. Both chart types are common in financial analysis and should suit both age groups. The colors used will be sober and visually appealing to both age groups.
For the second requirement we will again create a line plot, along with a bar plot, using matplotlib. The line graph is used for the same reason as in the first requirement. A bar chart presents categorical data with rectangular bars whose heights or lengths are proportional to the values they represent; it is another simple representation and shows how volume changes over time.
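To make the box plot concrete, the sketch below computes the statistics that matplotlib draws for one box, using bitcoin's yearly maximum High values for 2013-2019 exactly as listed in the `grouped_pivot` table of this notebook. It assumes matplotlib's default whisker rule (`whis=1.5`), which pandas' `plot(kind='box')` inherits.

```python
import pandas as pd

# Bitcoin yearly maximum "High", 2013-2019, from the grouped_pivot table.
btc = pd.Series([1156.14, 1017.12, 495.56, 979.40, 20089.00, 17712.40, 13796.49],
                index=range(2013, 2020))

# The box spans the first to third quartile, with a line at the median.
q1, median, q3 = btc.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Whiskers reach the most extreme data points lying within 1.5 * IQR of the
# box; anything beyond those fences would be drawn as an outlier (flier).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
whisker_low = btc[btc >= lower_fence].min()
whisker_high = btc[btc <= upper_fence].max()
outliers = btc[(btc < lower_fence) | (btc > upper_fence)]

print(q1, median, q3, whisker_low, whisker_high, len(outliers))
```

Here no year falls outside the fences, so bitcoin's whiskers run from its lowest to its highest yearly maximum and the box covers the large jump between 2016 and 2017.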
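A minimal sketch of the grouped bar chart, built from the yearly volumes in the `df4_pivot` table of this notebook. It renders off-screen with the Agg backend so it runs without a display; the name `vol` is illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Yearly volumes for the top 3 coins, copied from the df4_pivot table.
vol = pd.DataFrame(
    {
        "bitcoin": [363320992, 22197999616, 23840899072, 45105733173],
        "ethereum": [199408000, 5179829760, 9214950400, 18661465873],
        "tether": [7399410, 4687949824, 6967777735, 53509128965],
    },
    index=pd.Index([2016, 2017, 2018, 2019], name="Date"),
)

# kind="bar" draws one group of bars per index value (year) and one bar per
# column (coin); each bar's height equals the corresponding cell value.
ax = vol.plot(kind="bar", figsize=(10, 5))
ax.set_xlabel("Years")
ax.set_ylabel("Volume")

# Collect the drawn bar heights, then release the figure.
heights = sorted(p.get_height() for p in ax.patches)
plt.close(ax.figure)
```

Four years times three coins gives twelve bars, and their heights are exactly the table values.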
Importing Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Reading the CSV File
df = pd.read_csv("D:\\Documents\\Assignment\\consolidated_coin_data.csv", delimiter=",")
print ('Data read into a pandas dataframe!')
Data read into a pandas dataframe!
df.head() # Show the first 5 rows
| | Currency | Date | Open | High | Low | Close | Volume | Market Cap |
|---|---|---|---|---|---|---|---|---|
| 0 | tezos | 4-Dec-19 | 1.29 | 1.32 | 1.25 | 1.25 | 46048752 | 824588509 |
| 1 | tezos | 3-Dec-19 | 1.24 | 1.32 | 1.21 | 1.29 | 41462224 | 853213342 |
| 2 | tezos | 2-Dec-19 | 1.25 | 1.26 | 1.20 | 1.24 | 27574097 | 817872179 |
| 3 | tezos | 1-Dec-19 | 1.33 | 1.34 | 1.25 | 1.25 | 24127567 | 828296390 |
| 4 | tezos | 30-Nov-19 | 1.31 | 1.37 | 1.31 | 1.33 | 28706667 | 879181680 |
df.dtypes #Checking Datatypes
| Column | dtype |
|---|---|
| Currency | object |
| Date | object |
| Open | float64 |
| High | float64 |
| Low | float64 |
| Close | float64 |
| Volume | int64 |
| Market Cap | int64 |
df['Date'] = pd.to_datetime(df['Date']) # Parse strings such as '4-Dec-19' into datetimes
df['Date'] = df['Date'].dt.year # Keep only the calendar year; the analysis is done per year
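The date strings in the CSV follow a day-month-year pattern such as `4-Dec-19`. A sketch, assuming every row uses that pattern, of parsing with an explicit `format` so pandas does not have to guess per element and a malformed row raises an error instead of being silently misread; the example strings and the names `raw`, `dates` and `years` are illustrative.

```python
import pandas as pd

# Example strings in the same style as the CSV's Date column.
raw = pd.Series(["4-Dec-19", "30-Nov-19", "1-Jan-13"])

# %d = day, %b = abbreviated month name, %y = two-digit year.
dates = pd.to_datetime(raw, format="%d-%b-%y")

# The .dt accessor exposes datetime components; .year gives integer years.
years = dates.dt.year
print(years.tolist())
```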
df.dtypes #Checking updated column
| Column | dtype |
|---|---|
| Currency | object |
| Date | int64 |
| Open | float64 |
| High | float64 |
| Low | float64 |
| Close | float64 |
| Volume | int64 |
| Market Cap | int64 |
df.describe() #Understanding the data
| | Date | Open | High | Low | Close | Volume | Market Cap |
|---|---|---|---|---|---|---|---|
| count | 28944.000000 | 28944.000000 | 28944.000000 | 28944.000000 | 28944.000000 | 2.894400e+04 | 2.894400e+04 |
| mean | 2016.111940 | 300.719748 | 309.832808 | 290.858372 | 300.947362 | 8.133058e+08 | 7.194826e+09 |
| std | 1.920268 | 1373.884718 | 1416.598612 | 1325.072673 | 1374.461259 | 3.059516e+09 | 2.469322e+10 |
| min | 2013.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 |
| 25% | 2014.000000 | 0.210000 | 0.210000 | 0.200000 | 0.210000 | 2.418700e+05 | 6.345143e+07 |
| 50% | 2016.000000 | 2.995000 | 3.090000 | 2.880000 | 2.980000 | 5.212684e+06 | 3.453673e+08 |
| 75% | 2018.000000 | 24.430000 | 25.530000 | 23.270000 | 24.430000 | 1.554764e+08 | 3.422403e+09 |
| max | 2019.000000 | 19475.800000 | 20089.000000 | 18974.100000 | 19497.400000 | 5.350913e+10 | 3.265025e+11 |
df.info(verbose=False) #Information about the dataframe
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28944 entries, 0 to 28943
Columns: 8 entries, Currency to Market Cap
dtypes: float64(4), int64(3), object(1)
memory usage: 1.8+ MB
Checking for Missing Data
missing_data = df.isnull()
missing_data.head(5)
| | Currency | Date | Open | High | Low | Close | Volume | Market Cap |
|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False |
df.isnull().sum()
| Column | Missing values |
|---|---|
| Currency | 0 |
| Date | 0 |
| Open | 0 |
| High | 0 |
| Low | 0 |
| Close | 0 |
| Volume | 0 |
| Market Cap | 0 |
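This dataset has no missing values, but the plan above says gaps would be "dealt with accordingly". A sketch of the two usual options on a hypothetical frame with two gaps (all names and values invented): dropping incomplete rows, or forward-filling within each currency. Forward-filling assumes rows are in chronological order within each currency; the CSV lists dates newest-first, so it would need sorting first.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing High and one missing Volume.
toy = pd.DataFrame({
    "Currency": ["bitcoin", "bitcoin", "ethereum", "ethereum"],
    "High": [100.0, np.nan, 10.0, 12.0],
    "Volume": [5.0, 6.0, 7.0, np.nan],
})

missing_per_column = toy.isnull().sum()      # NaN count per column
any_missing = toy.isnull().to_numpy().any()  # single True/False for the frame

# Option 1: drop every row that has at least one NaN.
dropped = toy.dropna()

# Option 2: carry the previous value forward, never across currencies.
filled = toy.groupby("Currency")[["High", "Volume"]].ffill()
```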
df.corr(numeric_only=True) # Pearson correlation between the numeric columns (Currency is text and is skipped)
| | Date | Open | High | Low | Close | Volume | Market Cap |
|---|---|---|---|---|---|---|---|
| Date | 1.000000 | 0.184136 | 0.183393 | 0.185257 | 0.184023 | 0.345563 | 0.268524 |
| Open | 0.184136 | 1.000000 | 0.999268 | 0.998868 | 0.998551 | 0.560011 | 0.953660 |
| High | 0.183393 | 0.999268 | 1.000000 | 0.998588 | 0.999403 | 0.561062 | 0.954377 |
| Low | 0.185257 | 0.998868 | 0.998588 | 1.000000 | 0.999205 | 0.559677 | 0.954393 |
| Close | 0.184023 | 0.998551 | 0.999403 | 0.999205 | 1.000000 | 0.560457 | 0.955012 |
| Volume | 0.345563 | 0.560011 | 0.561062 | 0.559677 | 0.560457 | 1.000000 | 0.591818 |
| Market Cap | 0.268524 | 0.953660 | 0.954377 | 0.954393 | 0.955012 | 0.591818 | 1.000000 |
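`DataFrame.corr` computes the Pearson coefficient by default: the covariance of two columns divided by the product of their standard deviations. The near-1 values among Open, High, Low and Close are expected, since they are prices of the same coin on the same day. A sketch checking the formula against pandas on made-up numbers (`x` and `y` are illustrative):

```python
import numpy as np
import pandas as pd

# Made-up paired values, only to check the formula against pandas.
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 4.0, 5.0, 4.0, 5.0])

# r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2)), with dx, dy the deviations
# from each series' mean.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = x.corr(y)  # method="pearson" is the default
print(r_manual, r_pandas)
```

Here the deviation products sum to 6 and the squared deviations to 10 and 6, so r = 6 / sqrt(60), about 0.775.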
To view the dimensions of the dataframe, we use the .shape attribute.
df.shape # size of dataframe (rows, columns)
(28944, 8)
df1 = df[['Currency','Date','High']] # Keep only the columns needed for the price visual
df1 = df1.groupby(['Currency','Date'],as_index=False).max() # One row per currency and year, keeping the yearly maximum High
df1
| | Currency | Date | High |
|---|---|---|---|
| 0 | binance-coin | 2013 | 53.15 |
| 1 | binance-coin | 2014 | 32.06 |
| 2 | binance-coin | 2015 | 8.73 |
| 3 | binance-coin | 2016 | 5.95 |
| 4 | binance-coin | 2017 | 53.55 |
| ... | ... | ... | ... |
| 79 | xrp | 2015 | 0.02 |
| 80 | xrp | 2016 | 0.01 |
| 81 | xrp | 2017 | 2.85 |
| 82 | xrp | 2018 | 3.84 |
| 83 | xrp | 2019 | 0.51 |
84 rows × 3 columns
grouped_pivot = df1.pivot(index='Date',columns='Currency',values='High') #Creating Pivot Table
grouped_pivot
| Date | binance-coin | bitcoin | bitcoin-cash | bitcoin-sv | cardano | eos | ethereum | litecoin | stellar | tether | tezos | xrp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2013 | 53.15 | 1156.14 | 147.49 | 53.15 | 53.15 | 53.15 | 1156.14 | 53.15 | 53.15 | 147.49 | 53.15 | 147.49 |
| 2014 | 32.06 | 1017.12 | 0.03 | 32.06 | 32.06 | 32.06 | 1017.12 | 32.06 | 32.06 | 0.03 | 32.06 | 0.03 |
| 2015 | 8.73 | 495.56 | 1.22 | 8.73 | 0.01 | 8.73 | 320.43 | 8.73 | 0.01 | 1.22 | 0.01 | 0.02 |
| 2016 | 5.95 | 979.40 | 1.01 | 5.95 | 0.00 | 5.95 | 21.52 | 5.95 | 0.00 | 1.01 | 0.00 | 0.01 |
| 2017 | 53.55 | 20089.00 | 4355.62 | 53.55 | 0.78 | 53.55 | 881.94 | 375.29 | 0.39 | 1.21 | 12.19 | 2.85 |
| 2018 | 24.91 | 17712.40 | 3071.16 | 243.79 | 1.33 | 22.89 | 1432.88 | 323.11 | 0.94 | 1.07 | 7.55 | 3.84 |
| 2019 | 39.57 | 13796.49 | 522.09 | 255.88 | 0.11 | 8.59 | 361.40 | 146.43 | 0.16 | 1.06 | 1.83 | 0.51 |
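The groupby-then-pivot sequence used for `grouped_pivot` can also be written as a single `pivot_table` call, which aggregates duplicate (Date, Currency) pairs itself through `aggfunc`. A sketch on hypothetical rows (`toy`, `two_step` and `one_step` are illustrative names) showing that both routes give the same table:

```python
import pandas as pd

# Hypothetical long-format rows: several daily highs per currency and year.
toy = pd.DataFrame({
    "Currency": ["bitcoin", "bitcoin", "bitcoin", "xrp", "xrp"],
    "Date": [2017, 2017, 2018, 2017, 2018],
    "High": [900.0, 20089.0, 17712.4, 2.85, 3.84],
})

# Two-step route used in the notebook: aggregate with groupby, then pivot.
two_step = (toy.groupby(["Currency", "Date"], as_index=False).max()
               .pivot(index="Date", columns="Currency", values="High"))

# One-step route: pivot_table applies the aggregation while reshaping.
one_step = toy.pivot_table(index="Date", columns="Currency",
                           values="High", aggfunc="max")
print(one_step)
```

`pivot` alone would raise an error on the duplicate bitcoin 2017 rows; `pivot_table` resolves them with the chosen aggregation.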
plt.style.use('bmh') # bmh style used for the visualizations
grouped_pivot.plot(kind='line',figsize=(10,5))
plt.title('Change in Price of Currencies',fontsize = 20)
plt.ylabel('Price',fontsize = 15)
plt.xlabel('Years',fontsize = 15)
plt.legend(fontsize = 10)
plt.show()
grouped_pivot.plot(kind='box', figsize=(14, 5)) # One box per currency, summarizing its yearly maximum prices
plt.title('Distribution of Yearly Maximum Price by Currency (2013-2019)')
plt.ylabel('Price')
plt.xlabel('Currency')
plt.yticks(np.arange(0, 21000, 2500))
plt.show() # Render the figure with all the updates above
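df2 = df[['Currency','Date','Volume']] # Keep only the columns needed for the volume visual
df2 = df2.groupby(['Currency','Date'],as_index=False).max() # One row per currency and year, keeping the yearly maximum Volume
df3 = df2[df2['Date'] > 2015] # Keep 2016-2019 (the data ends in 2019)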
df3.head()
| | Currency | Date | Volume |
|---|---|---|---|
| 3 | binance-coin | 2016 | 19773600 |
| 4 | binance-coin | 2017 | 1730780032 |
| 5 | binance-coin | 2018 | 637020992 |
| 6 | binance-coin | 2019 | 742382920 |
| 10 | bitcoin | 2016 | 363320992 |
df3_pivot = df3.pivot(index='Date',columns='Currency',values = 'Volume')
df3_pivot
| Date | binance-coin | bitcoin | bitcoin-cash | bitcoin-sv | cardano | eos | ethereum | litecoin | stellar | tether | tezos | xrp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2016 | 19773600 | 363320992 | 7399410 | 19773600 | 2369690 | 19773600 | 199408000 | 19773600 | 2369690 | 7399410 | 2369690 | 15967600 |
| 2017 | 1730780032 | 22197999616 | 11889600512 | 1730780032 | 645155968 | 1730780032 | 5179829760 | 6961679872 | 537916032 | 4687949824 | 280844992 | 8108389888 |
| 2018 | 637020992 | 23840899072 | 5377260032 | 637020992 | 1713769984 | 4870720000 | 9214950400 | 3481550080 | 1513270016 | 6967777735 | 15960800 | 9110439936 |
| 2019 | 742382920 | 45105733173 | 4522945333 | 1701098408 | 340512939 | 5394932035 | 18661465873 | 6442000276 | 837742777 | 53509128965 | 123832790 | 9415068271 |
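The text names Bitcoin, Ethereum and Tether as the top 3 without stating how they were chosen. One possible rule, assumed here rather than taken from the original analysis, is to rank currencies by their total volume over 2016-2019 in the table above. The sketch uses five of its columns (`vol`, `totals` and `top3` are illustrative names); applying the same rule to all twelve columns selects the same three coins.

```python
import pandas as pd

# A subset of the df3_pivot table (yearly volumes, 2016-2019).
vol = pd.DataFrame(
    {
        "bitcoin": [363320992, 22197999616, 23840899072, 45105733173],
        "bitcoin-cash": [7399410, 11889600512, 5377260032, 4522945333],
        "ethereum": [199408000, 5179829760, 9214950400, 18661465873],
        "tether": [7399410, 4687949824, 6967777735, 53509128965],
        "xrp": [15967600, 8108389888, 9110439936, 9415068271],
    },
    index=pd.Index([2016, 2017, 2018, 2019], name="Date"),
)

# Sum each currency over the four years and keep the three largest totals.
totals = vol.sum()
top3 = totals.nlargest(3).index.tolist()
print(top3)
```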
df4 = df3[df3['Currency'].isin(['bitcoin', 'ethereum', 'tether'])] # Keep only the top 3 currencies
df4_pivot = df4.pivot(index='Date',columns='Currency',values = 'Volume')
df4_pivot
| Date | bitcoin | ethereum | tether |
|---|---|---|---|
| 2016 | 363320992 | 199408000 | 7399410 |
| 2017 | 22197999616 | 5179829760 | 4687949824 |
| 2018 | 23840899072 | 9214950400 | 6967777735 |
| 2019 | 45105733173 | 18661465873 | 53509128965 |
df4_pivot.plot(kind='bar',figsize=(10, 5))
plt.title('Change in Volume of Top 3 Currencies from 2016-2019', fontsize = 15)
plt.ylabel('Volume')
plt.xlabel('Years')
plt.legend(fontsize = 10)
plt.show()
df4_pivot.plot(kind='line',figsize=(10, 5)) # Line chart of the same yearly volumes
plt.title('Change in Volume of Top 3 Currencies from 2016-2019', fontsize = 15)
plt.ylabel('Volume')
plt.xlabel('Years')
plt.xticks(np.arange(2016, 2020, 1))
plt.legend(fontsize = 10)
plt.show()